Diagnostic Judgment as a Function of the Pre-Processing of Evidence
LEE  FRIEDMAN,  WILLIAM  C.  HOWELL,  and  CARY  R.  JENSEN,  Rice 
University,  Houston,  Texas 


Two  experiments  were  conducted  to  determine  how  the  quality 
of  a  human  judgment  (in  this  case,  military  threat 
diagnosis)  is  affected  by  various  levels  of  pre-processing 
applied  to  the  raw  predictive  events  when  such  processing  is 
carried  out  by  the  human  and  by  a  machine  "aid."  The 
subject  was  required  to  estimate  the  threat  of  attack  on  the 
friendly  position  (criterion)  posed  by  levels  of  activity 
observed  in  various  enemy  positions  (cues).  These  enemy 
positions  differed  in  the  degree  of  potential  threat  that 
they  posed.  Overall  threat  judgments  were  made  under 
conditions  in  which  a  prior  overt  estimate  of  position 
activity  levels  was  or  was  not  required.  Machine-aiding 
conditions were as follows: 1) no aiding, where the subject
simply observed raw events in "real time" (Experiment 2),
2) automatic (Experiments 1 and 2) or self (Experiment 1)
tabulation of events, and 3) automatic computation of events
(Experiment 2). Finally, the rate of event occurrences was
manipulated (Experiment 2). When subjects made overall
criterion  judgments  (threat  evaluation)  intuitively  on  the 
basis  of  events  observed  in  "real  time",  their  performance 
improved markedly when cue estimation was interposed, even if cue
estimation was fairly inaccurate. If events were computed
automatically,  permitting  a  more  "analytic"  threat  judgment, 
performance  improved  and  the  redundant  estimation  step  was 
not helpful. If events were merely tabulated, estimation
was  helpful,  but  to  an  extent  midway  between  the  raw- 
observation  and  automatic  computation  conditions. 


Requests  for  reprints  should  be  sent  to  Lee  Friedman, 
Psychology Department, Rice University, P. O. Box 1892,
Houston,  TX  77251. 

Running  Title:  DIAGNOSTIC  JUDGMENT 

Key  Words:  diagnostic  judgment,  pre-processing  of  evidence 

INTRODUCTION

A  common  task  in  military,  medical,  business  and  most 
other  decision  systems  is  that  of  diagnosing  the  aggregate 
meaning  of  a  succession  of  equivocal  predictive  events  — 
test  results,  reports,  indexes,  observations.  For  example, 
the  physician  examines  the  patient's  medical  history, 
presenting  symptoms,  and  test  results  in  forming  a  medical 
opinion;  the  businessperson  weighs  economic  indices,  cost 
projections,  and  market  analyses  in  judging  the  potential  of 
a  new  product;  the  commander  evaluates  a  stream  of 
intelligence  information  in  estimating  the  threat  posed  by 
an  enemy  force. 

With  the  evolution  of  sophisticated  technologies  for 
obtaining  and  processing  such  predictive  data,  the  demands 
on  the  human  decision  maker  have  grown,  as  have  the 
possibilities  for  automating  some  or  all  of  the  component 
functions  (Schrenk,  1969;  Slovic,  1981).  In  fact,  use  of 
so-called  "decision  aids"  —  particularly  in  diagnosis  — 
has  become  fairly  common  in  contexts  as  varied  as 
professional sports, medicine, business, and military C³I
systems  (Sage,  1981;  Wohl,  1981). 

Despite  these  advances,  however,  the  question  of  how  best 
to  allocate  decision  functions  between  man  and  machine  is 
still unresolved (Slovic, Fischhoff, & Lichtenstein,

Part  of  the  problem  lies  in  our  lack  of  understanding  of 
exactly  how  human  capabilities,  task  demands,  and  decision 
quality are related. True, a mass of research has appeared
over the last decade exposing various forms of human
"nonoptimality" (Kahneman, Slovic, & Tversky, 1982;
Tversky & Kahneman, 1974), but it remains to be seen how
general these "biases" are and to what extent they degrade
performance on actual decision problems (Cohen, 1979;
Einhorn & Hogarth, 1981; Hogarth, 1980). Most of the
research  has  dealt  with  a  particular  facet  of  judgment  or 
choice  in  isolation,  using  whatever  task  seemed  most 
appropriate  for  that  particular  function.  Thus,  for 
example,  strings  of  numbers  or  other  events  have  been  used 
to  assess  frequency/probability  estimation  (Erlick,  1964); 
the  classical  urn-and-balls  or  bookbag-and-poker-chips 
problem  has  been  a  favorite  Bayesian  inference  paradigm 
(Edwards,  1968;  Peterson  &  Beach,  1967);  general  knowledge 
items  have  been  used  to  study  confidence  in  judgment 
(Slovic, Fischhoff, & Lichtenstein, 1976); numerical values
attached  to  predictive  "cues"  have  been  preferred  in 
policy-capturing  and  multiple-cue-probability-learning 
research (Hammond, McClelland, & Mumpower, 1980; Kerkar,
1983); and carefully structured lotteries have been the main
vehicle for studying choice behavior (Payne, Laughhunn, &
Crum, 1982; Tversky & Kahneman, 1981).

In their natural habitat, of course, decision problems
are not conveniently structured into these elements.
Rarely, for example, does a personnel officer choose
candidates merely by aggregating a set of "cue" or predictor
scores (as in policy capturing); more likely, he/she uses
such  "processed"  data  in  conjunction  with  raw  observations 
covering  some  of  the  same  characteristics  and  others  derived 
from  interviews,  reference  checks,  and  work  history.  Thus 
it  is  hard  to  say  how  the  well-established  inferiority  of 
man  to  model  in  developing  and  applying  a  cue-weighting 
strategy  (Dawes  &  Corrigan,  1974;  Goldberg,  1970)  will 
affect  the  actual  quality  of  candidate  selection. 

Similarly,  a  military  commander  may  well  be  subject  to 
biases  associated  with  the  heuristic  estimation  of  event 
probability  (Sage,  1981;  Wohl,  1981);  yet  in  practice, 
he/she  may  rarely  make  overt  estimates,  and  the  question  of 
whether  such  biases  will  seriously  affect  his/her  ultimate 
diagnosis  or  action  cannot  be  directly  answered.  In  a  word, 
we  have  difficulty  translating  the  available  data  on  human 
cognitive  limitations  into  decision  system  recommendations 
because  we  do  not  know  (1)  how  paradigm-specific  the 
limitations  are,  (2)  how  many  of  the  basic  cognitive 
processes  actually  occur  in  any  particular  decision  problem 
or  (3)  how  such  processes,  if  they  occur,  act  and  interact 
to  affect  system  output. 

What  we  do  know  is  that  human  judgment  and  decision 
making  is  subject  to  a  variety  of  subtle,  formally 
irrelevant task influences (Einhorn & Hogarth, 1981;
Hammond, 1981; Howell & Burnett, 1978; Kahneman & Tversky,
1979). Further, it appears that merely requiring the
decision maker to perform certain processing steps (such as
overt frequency estimation) on the way to a terminal
response (such as diagnosis or action selection) can itself
influence the quality of the output (Howell & Kerkar, 1982).
In  view  of  these  considerations,  it  would  seem  useful  to 
study  the  issue  of  function  allocation  in  a  more 
comprehensive  fashion  than  has  typically  been  done,  using  a 
task  comprising  more  than  a  single  facet  of  the  decision 
process.  The  present  studies  represent  a  start  in  this 
direction. 

The  purpose  of  the  two  experiments  reported  below  was  to 
determine  how  the  quality  of  a  human  judgment  (in  this  case, 
military  threat  diagnosis)  is  affected  by  various  levels  of 
pre-processing  applied  to  the  raw  predictive  events  when 
such processing is carried out by the human and by a machine
"aid." In essence, the paradigm extends the standard
"policy-capturing" task to a situation in which the "cue
values" (processed predictors) are derived from a more
fundamental set of events (raw observations) by man,
machine,  or  a  combination.  More  specifically,  the  subject 
is  required  to  estimate  the  threat  of  attack  on  his  position 
(criterion)  posed  by  levels  of  activity  observed  in  various 
enemy  positions  (cues).  The  activity  levels,  however,  are 
themselves  a  direct  reflection  of  the  rate  of  observed 
events  over  time.  Thus  with  automated  pre-processing  of 
cues (activity levels) the task becomes a straight
policy-capturing paradigm; with total manual processing, it becomes
a typical "intuition" task; with manual pre-processing, it
becomes a structured, two-stage judgment task. Using this
approach  it  was  possible  to  examine  directly  the  quality  of 
the  overall  judgments  as  well  as  the  various  subprocesses 
involved  in  each  functional  allocation. 

EXPERIMENTS 

Two  studies  were  carried  out  using  essentially  the  same 
task  and  paradigm.  Both  involved  (between-groups) 
comparison  of  overall  threat  judgments  made  under  conditions 
in  which  an  overt  estimate  of  position  activity  levels  was 
required (estimation groups) or was not required (no-estimation
groups) for identical sets of raw observations
(citings). Both studies also included a between-groups
"aiding" manipulation. In Experiment 1 the aiding
manipulation concerned whether event citings were tabulated
automatically or whether subjects had to press particular
keys to tabulate the events (automatic tabulation vs.
self-tabulation). In Experiment 2 the aiding manipulation
consisted of three conditions: 1) no aiding (subjects
simply observed raw events in real time), 2) automatic
tabulation, as above, and 3) automated computation of cue
values. Finally, a within-groups manipulation (rate of
citings) was incorporated into the second study.

In  view  of  the  similarities  between  the  two  studies,  all 
common  methodological  features  will  be  described  here,  and 
any  unique  features  will  be  noted  in  the  subsequent 
description  of  the  individual  experiments. 

Subjects. A total of 150 Rice University undergraduate
students volunteered to participate in exchange for course
credit or pay ($4.00 per hour). The first 60 of these were
assigned randomly to the four groups comprising Experiment 1; the
remaining  90  were  assigned  likewise  to  the  six  groups  of 
Experiment  2.  All  groups  in  both  studies,  therefore, 
consisted  of  15  subjects  apiece. 

Apparatus  and  procedure.  Subjects  served  individually 
for  a  single  session  which  lasted  approximately  one  hour. 
During  this  time  they  completed  20  problems,  each  of  which 
consisted of a series of citings obtained over a
several-minute period from four hypothetical enemy locations. Each
problem terminated with the subject's evaluation of overall
threat  posed  for  that  problem.  The  entire  experiment  was 
programmed  on  a  TRS-80  Model  III  microprocessor  which  was 
set  up  in  a  small  experimental  booth.  Citings  were 
displayed as flashing digits "1", "2", "3", or "4", each of
which  appeared  in  one  of  four  respective  areas  of  the  CRT, 
the  latter  representing  the  four  enemy  positions.  Depending 
upon  the  experimental  conditions,  citings  were  sometimes 
preserved  on  the  CRT  for  the  duration  of  a  problem  as  small 
squares. These squares disappeared at the end of the
problem  when  subjects  were  instructed  to  assess  enemy 
readiness  and/or  threat.  Responses,  which  were  made  via 
designated  keys  on  the  keyboard,  were  recorded 

Task. The instructions informed subjects that they were
to  serve  as  military  intelligence  officers  responsible  for 
monitoring  activity  in  four  regions  controlled  by  enemy 
forces  and  for  evaluating  the  overall  threat  posed  to 
friendly  forces.  Enemy  regions  were  designated  according  to 
their  suitability  as  sites  from  which  to  launch  an  attack: 
Region 1 was the most suitable; Region 4, the least.
Activity was defined in terms of citings yielded by combined
surveillance systems: in Experiment 1, for example, 0-4
citings  per  region  over  the  course  of  a  problem  was 
considered  normal  under  peaceful  conditions,  5-9  was 
moderate  and  could  represent  a  build-up  in  readiness  for 
attack,  10-14  was  high  and  indicative  of  a  significant 
build-up.  Thus  the  subject  was  to  consider  both  the 
activity  observed  (cue  value)  and  the  prior  suitability  of 
location  (importance  weight)  in  evaluating  threat  posed  by 
any  region;  overall  threat  was  the  aggregate  for  all  four 
regions. 

In  the  course  of  a  problem,  the  subject  would  see 
anywhere  from  0  to  56  citings  distributed  across  the 
regions.  Distribution  was  varied  over  problems  such  that 
normal, moderate and high activity levels occurred in each
region  with  equal  frequency. 


At  the  end  of  each  problem,  the  subject  was  required  to 
estimate  overall  threat  posed  on  a  scale  of  1  (no  threat)  to 
10 (attack imminent). In addition, he/she made an
all-or-none "war-peace" judgment following everything else, or at
any  time  during  a  problem  when  the  perceived  threat  exceeded 
5  on  the  overall  scale.  The  "war-peace"  feature  was 
included  primarily  to  encourage  subjects  to  remain  cognizant 
throughout  the  problem  of  their  role  as  aggregator  as  well 
as  monitor,  and  generally  to  help  maintain  interest.  No 
explicit  cost-payoff  scheme  was  attached  to  it.  In  fact, 
instructions  clearly  emphasized  that  the  numerical  threat 
rating was the subjects' principal responsibility.¹

As  noted  earlier,  one  variable  of  interest  was  the  presence  or 
absence  of  a  "cue-value"  estimation  requirement.  In  this  task, 
activity  level  was  the  primary  cue,  hence  frequency  of  citings 
(normal,  moderate,  or  high)  constituted  the  estimation 
requirement  for  those  conditions  where  it  applied.  Therefore, 
estimation  groups  judged  activity  level  for  each  region  just 
prior to their overall threat evaluation, whereas no-estimation
groups  simply  rated  overall  threat. 

An objective threat index was computed for each problem by
simply weighting each region's importance (1-4) by the number of
programmed citings and summing over the four regions. Similarly,
of course, an objective activity index was available in the
number of actual citings at each location. Using these measures
it was possible to calculate the accuracy of both kinds of
judgments as well as the all-or-none "war-peace" response. In
addition, by regressing threat evaluations (criterion values) on
activity levels (cue values) it was possible to derive estimates
of the subjective importance accorded each region (i.e.,
b-weights), and by comparing these weights to the assigned (1-4)
values it was possible to evaluate the subject's weighting
policies.

¹ All-or-none data were analyzed but, since they yielded no
information other than that reflected in the more precise
ratings, they will not be discussed further.
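
As an illustration only, the objective threat index and the correlational accuracy measure described above might be computed along the following lines. This is a sketch under stated assumptions, not the original TRS-80 program: the mapping of importance weights to regions (Region 1 = 4 down to Region 4 = 1) is assumed, since the text gives only the 1-4 range, and the function names, problem data, and ratings are hypothetical.

```python
from math import sqrt

# Assumption: Region 1 (most suitable for attack) carries weight 4 and
# Region 4 carries weight 1; the text gives only the 1-4 range.
IMPORTANCE = (4, 3, 2, 1)


def threat_index(citings):
    """Importance-weighted sum of programmed citings over the four regions."""
    if len(citings) != len(IMPORTANCE):
        raise ValueError("expected one citing count per region")
    return sum(w * c for w, c in zip(IMPORTANCE, citings))


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def judgment_accuracy(ratings, problems):
    """Correlate a subject's 1-10 threat ratings with the objective index."""
    return pearson_r(ratings, [threat_index(p) for p in problems])


# Hypothetical problems: citing counts for Regions 1-4 (Experiment 1 ranges).
problems = [(2, 7, 12, 3), (12, 12, 7, 7), (1, 2, 3, 0), (7, 2, 12, 12)]
ratings = [5, 9, 1, 6]  # made-up subject responses, one per problem

print([threat_index(p) for p in problems])  # [56, 105, 16, 70]
print(round(judgment_accuracy(ratings, problems), 3))
```

Under the opposite weight assignment the index values would change, but the procedure (weight, sum, then correlate with the ratings) would be the same.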

EXPERIMENT  1 

Method 

In  this  study,  the  principal  questions  were  whether  overt 
estimation  of  readiness  at  the  four  locations  enhances  aggregate 
threat  evaluation,  and  whether  automatic  tabulation  adds  to  that 
enhancement  any  more  than  self-tabulation  does.  The  former 
manipulation  was  described  previously.  In  the  automatic 
tabulation  condition,  each  event  (represented  by  a  digit)  flashed 
on the CRT for 0.25 second, and was replaced by a small square
that  remained  on  the  screen  during  the  problem  until  subjects 
were  instructed  to  assess  readiness  and/or  threat.  In  the  self- 
tabulation  condition,  subjects  had  to  press  a  particular  key 
("1",  "2",  "3",  or  "4",  depending  upon  the  region  where  the  event 
occurred) after each event in order to have it tabulated
(preserved as a square). The design was a simple 2 X 2 factorial
combination  of  these  variables  using  four  groups  of  15  subjects 
each.  The  actual  citing  frequencies  used  in  each  region  during  a 
problem  were  drawn  randomly  from  normal  distributions  over  the 
ranges 0-4 (mean = 2), 5-9 (mean = 7), and 10-14 (mean = 12) for
"normal", "moderate", and "high" readiness respectively.

Results  and  Discussion 

Correlations  of  threat  ratings  with  objective  values  are  shown 
in  Table  1. 


Insert  Table  1  about  here 


A between-groups analysis of variance revealed a marginally
significant estimation effect, F(1, 56) = 3.58, p = 0.06, but
neither aiding nor its interaction with estimation approached
significance, both F(1, 56) < 1.0.

The findings using the more process-oriented
"policy-capturing" measure, while consistent with the accuracy index,
were a bit more clear-cut, as shown in Table 2. The b-weights
obtained under estimation conditions were considerably closer to
the optimal values over the four regions than were those yielded
by judgments made directly from observations (no-estimation
conditions). In this case, a MANOVA was the appropriate
statistical test, and the Hotelling-Lawley trace was used to
approximate F. Here, the main effect of estimation was
significant, F(4, 53) = 2.76, p < 0.04, and again, neither
aiding nor its interaction with the estimation variable was
significant, both F(4, 53) < 1.30, p > 0.29.
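
For readers unfamiliar with the policy-capturing measure, the b-weights for a single subject can be thought of as the least-squares slopes obtained by regressing that subject's threat ratings on the four regional activity levels. The sketch below is a minimal illustration under assumptions: it uses ordinary least squares with an intercept (the manuscript does not specify the exact regression model), and the helper names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular cue matrix; weights are not identifiable")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def b_weights(cues, ratings):
    """Least-squares slopes of ratings on cue values (intercept discarded)."""
    rows = [[1.0, *map(float, c)] for c in cues]  # leading 1.0 is the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ratings)) for i in range(k)]
    return solve(xtx, xty)[1:]


# Hypothetical subject whose ratings follow a 4-3-2-1 policy exactly.
cues = [(2, 7, 12, 3), (12, 12, 7, 7), (1, 2, 3, 0), (7, 2, 12, 12),
        (3, 12, 2, 7), (12, 3, 7, 2), (0, 7, 1, 12)]
ratings = [1 + 4 * a + 3 * b + 2 * c + 1 * d for a, b, c, d in cues]
print([round(w, 6) for w in b_weights(cues, ratings)])
```

A subject who attends to only the most important region would show one large weight and three near zero, which is the kind of departure from the assigned values that the comparison in Table 2 is meant to reveal.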

In general, then, the results support the hypothesis that
requiring  an  overt  estimate  of  cue  values  enhances  both  the  use 
of  those  cues  in  aggregate  judgment  and  the  overall  quality  of 
the  threat  assessment.  The  fact  that  automatic  tabulation  of 
events  provided  little  additional  benefit  over  self-tabulation 
may  be  attributable  to  a  ceiling  effect.  Subjects  in  the 
estimation  groups  made  correct  frequency  categorizations  on  92% 
of the problems in both self and automatic tabulation conditions.
Apparently,  as  long  as  events  were  preserved  on  the  CRT,  as  they 
were  in  both  aiding  conditions,  it  did  not  matter  whether  the 
subject  had  to  emit  responses  to  preserve  those  events.  It  is 
worth  noting,  however,  that  even  at  this  "ceiling"  level,  overt 
estimation  enhanced  the  ultimate  diagnosis.  The  second 
experiment  included  more  demanding  conditions. 

EXPERIMENT  2 

One purpose of this study was to determine the replicability
of  the  estimation  effect  found  in  Experiment  1  under  a  wider 
range  of  aiding  and  difficulty  conditions.  Another  was  to  extend 
aiding  to  the  point  of  actually  calculating  cue  values  (citing 
frequencies) as is typical in policy-capturing research. With
these added conditions it was possible to compare threat
evaluation (diagnosis) performance based on raw observations with
that for partially and fully processed predictive data as
discussed in the Introduction. The expectation was that aiding
would help, but that the estimation requirement would serve much
the same purpose under conditions conducive to accurate readiness
estimation.  Under  more  difficult  estimation  conditions,  of 
course,  the  relative  effectiveness  of  the  estimation  requirement 
should  decline  since  the  overall  threat  assessment  would  be  based 
on  less  accurate  "cue  values". 

Method 

The  basic  design  replicated  the  automatic  tabulation  condition 
of  Experiment  1  and  added  two  levels  of  aiding  --  1)  the  direct 
computation  of  citing  frequencies,  and  2)  an  unaided  condition  in 
which  subjects  had  to  deal  with  observed  events  in  real  time. 

Thus  it  consisted  of  six  groups  obtained  by  crossing  the 
estimation  variable  (two  levels)  with  aiding  conditions  (unaided, 
tabulation,  and  computation).  The  difficulty  variable  was 
manipulated  within  subjects  by  using  two  levels  of  input  rate: 
the  easier  condition,  3  min.  per  problem,  was  consistent  with 
Experiment  1;  the  more  difficult,  30  sec.  per  problem,  was  chosen 
to  eliminate  any  possibility  of  actually  counting  the  citings. 

The difficulty variable was applied only to the unaided and
tabulation groups since the computation groups did not actually
observe citings (so as to ensure that judgments were based
exclusively on the processed cue values). Therefore, there were
actually two designs: a 3 X 2 between-groups factorial with
estimation difficulty collapsed, and a 2 X 2 X 2 mixed factorial
with the computation conditions omitted.
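
The two designs can be made concrete by enumerating their cells and working out the error degrees of freedom implied by 15 subjects per group. The sketch below is illustrative only; the function names are hypothetical, and the multivariate formula assumes an exact F for a one-degree-of-freedom hypothesis on four b-weights, which the manuscript does not state explicitly but which agrees with the degrees of freedom it reports (84 and 56 for the univariate tests, 53 and 25 for the multivariate tests).

```python
from itertools import product

N_PER_GROUP = 15  # stated in the Subjects section


def error_df(n_groups, n_per_group=N_PER_GROUP):
    """Between-groups error df: total subjects minus number of groups."""
    return n_groups * (n_per_group - 1)


def manova_error_df(univariate_df, n_measures=4):
    """Denominator df of the exact F for a 1-df hypothesis on p measures."""
    return univariate_df - n_measures + 1


# 3 X 2 between-groups design (estimation X aiding), difficulty collapsed.
between = list(product(("no estimation", "estimation"),
                       ("unaided", "tabulation", "computation")))

# 2 X 2 X 2 mixed design: computation groups omitted, rate within subjects.
mixed_groups = [g for g in between if g[1] != "computation"]

print(len(between), error_df(len(between)))            # 6 groups, df = 84
print(len(mixed_groups), error_df(len(mixed_groups)))  # 4 groups, df = 56
print(manova_error_df(error_df(4)))                    # df = 53
print(manova_error_df(error_df(2)))                    # df = 25
```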

The only other noteworthy differences in methodology between
this study and the previous one were a slight increase in the
citing frequencies (a maximum of 20 rather than 14 in each region
per  problem)  and  a  corresponding  adjustment  in  the  activity-level 
ranges  (normal  readiness  was  0-6  citings  per  region  with  a  mean 
of  3;  moderate,  7-13  with  a  mean  of  10;  high,  14-20  with  a  mean 
of  17).  Since  the  rate  of  citings  was  varied  within  subjects  in 
the  unaided  and  tabulation  conditions,  order  effects  were 
controlled  by  randomizing  the  presentation  of  slow  and  fast 
problems  separately  for  each  subject. 

Results  and  Discussion 

The  data  for  overall  quality  of  threat  evaluations,  again 
expressed  in  terms  of  mean  correlations  between  obtained  and 
optimal  ratings,  are  summarized  in  Table  3. 


Insert  Table  3  about  here 


As  predicted,  performance  improved  systematically  with  level  of 
aiding  in  the  absence  of  any  overt  cue  estimation  requirement, 
but estimation alone produced substantial gains as well (from a mean correlation of
0.47 to 0.79). In fact, the 0.79 compares favorably with the
average for all aided conditions, which was 0.84. The
combination of aiding and estimation, however, added very little
to either alone. Threat evaluation performance was not
significantly different among the three groups who estimated
readiness. Further, no significant differences in aiding
(collapsed over tabulation and computation) appeared when threat
evaluation was preceded by readiness (cue) estimation (F < 1).
However, subjects in the tabulation/estimation group had
significantly higher correlations than those who had events
tabulated  but  did  not  estimate  readiness. 

The above conclusions are supported by a highly significant
estimation X aiding interaction, F(2, 84) = 23.50, p < 0.0001,
and by post hoc comparisons of estimation with no-estimation
means: F(1, 84) = 66.36, p < 0.0001 for the unaided condition;
4.82, p < 0.05 for the tabulation condition; and 2.64, p > 0.10
for the computation condition (reversed effect). The above
conclusions are also supported by post hoc comparisons of means
from the three aiding conditions: F(2, 84) = 133.64, p < 0.0001,
for the no-estimation condition; F(2, 84) = 2.73, p > 0.05, for
the estimation condition. Regarding the nonsignificant
estimation  effect  for  computation  groups,  it  should  be  noted  that 
the  only  readiness  estimation  involved  in  the  computation 
condition  was  classifying  the  presented  citing-frequency  numbers 
into  the  proper  readiness  ranges.  Despite  the  simplicity  of  this 
requirement,  accuracy  was  not  perfect  (98%),  which  probably 
accounts  for  the  nonsignificant  decrement  with  estimation. 

The  difficulty  variable  apparently  did  not  affect  the  quality 
of  threat  evaluations  of  unaided  and  tabulation  groups.  The 
difficulty effect was not statistically significant, F(1, 56) =
2.23, p > 0.14; nor were any interactions of difficulty with the
between-groups variables. However, the estimation X aiding
interaction was highly significant, thus substantiating the
results of the between-groups analyses, F(1, 56) = 26.69, p <
0.0001. In Table 4 it appears that while estimating readiness
improved threat evaluations of both unaided and tabulation
groups,  it  helped  the  unaided  group  considerably  more. 


Insert  Table  4  about  here 


In  contrast  to  the  overall  threat  judgment,  the  accuracy  of 
aided  readiness  estimates  was  unaffected  by  difficulty  (95%  for 
fast  vs.  98%  for  slow  conditions).  However,  the  mean  difference 
between  unaided  estimates  for  fast  and  slow  conditions  (76%  vs. 
93%, respectively) was significant, t(14) = 8.53, p < 0.0001.

The  fact  that  unaided  subjects  maintain  accurate  threat 
evaluations  even  when  their  readiness  estimates  are  inaccurate 
constitutes  rather  definitive  substantiation  of  the  estimation 
effect.  Even  fairly  inaccurate  cue  value  estimates  can  lead  to 
improved  threat  evaluations. 

In  sum,  the  results  of  these  analyses  indicate  that  when 
decision  makers  are  forced  to  make  overall  criterion  judgments 
(threat  evaluation)  intuitively  on  the  basis  of  events  observed 
in  "real  time",  their  performance  can  be  improved  markedly  by 
interposing  a  processing  step  (cue  estimation).  However,  if  this 
processing  is  done  automatically,  permitting  a  more  "analytic" 
approach  to  threat  judgment,  performance  improves  and  the 
redundant  estimation  step  is  not  helpful.  If  the  event 
occurrences  are  merely  preserved  but  not  processed,  estimation  is 
again  helpful,  but  to  an  extent  midway  between  the  raw- 
observation and the automatic processing conditions.

The above results are strengthened even further by the process
(b-weight)  measures  as  shown  in  Table  5. 


Insert  Table  5  about  here 


Separate MANOVAs for the three aiding groups yielded a
significant estimation effect only in the unaided condition: the
b-weights obtained with an estimation step were distributed more
optimally than those obtained without one in this completely
manual condition, F(4, 25) = 2.93, p < 0.04. While neither of
the aided conditions yielded a significant estimation difference,
F < 1.0, the trend under the tabulation condition was in the same
direction  as  that  for  the  unaided  condition.  It  will  be  recalled 
that  this  trend  also  was  apparent  for  estimation  groups  in 
Experiment  1.  In  particular  both  unaided  and  tabulation  groups 
that  are  required  to  make  intervening  estimates  of  cue  values 
tend  to  employ  all  of  the  cues  in  their  overall  threat  judgments, 
whereas  groups  that  do  not  estimate  cue  values  tend  to  ignore  all 
but  the  most  predictive  cue. 

In the repeated-measures (i.e., 2 X 2 X 2) MANOVA, the
difficulty variable had a significant main effect on the
distribution of b-weights, F(4, 53) = 2.78, p < 0.04, as did its
interaction with the other two variables, F(4, 53) = 2.68, p <
0.04. Since the principal reason for this interaction appears to
have been a poor distribution of weights under the unaided,
no-estimation condition, the results are consistent with the
conclusion that estimation helps most when conditions are
otherwise not very conducive to judgment. Surprisingly, this is
true  even  though  the  estimated  cue  values  under  the  unaided 
condition  were  12%  less  accurate,  on  the  average,  than  under  any 
of  the  aided  conditions. 

CONCLUSION 

The  two  studies  reported  here  offer  strong  support  for  the 
proposition  that  higher-order,  integrative  judgments  (threat 
diagnosis)  benefit  from  the  explicit  "processing"  of  lower-order 
information  whether  carried  out  manually  or  through  machine 
aiding.  Conversely,  and  perhaps  more  importantly,  serious 
deficiencies  in  the  quality  of  diagnostic  judgments  are  likely  if 
the  human  decision  maker  draws  inferences  directly  from  a  stream 
of  "raw"  observations.  In  such  situations,  he/she  tends  to  limit 
consideration  to  the  most  predictive  items,  virtually  ignoring 
lesser  —  yet  still  very  useful  —  cues. 

The  tendency  toward  overselection  in  the  use  of  diagnostic 
evidence  has,  of  course,  been  reported  before  in  other  contexts 
(e.g., Nisbett & Ross, 1980). The typical explanation is that
it  represents  a  means  of  coping  with  information  overload,  a 
somewhat  adaptive  mechanism  whereby  the  human  compensates  for  his 
limited  capacity  by  simplifying  the  environment  (and  perhaps 
losing  some  predictive  power  in  the  process).  Neither 
information  overload  nor  capacity  limits,  however,  seem  to 
account  entirely  for  the  present  results.  The  estimation 
requirement  added  to,  rather  than  subtracted  from,  the  overall 
task  demands,  yet  it  produced  a  consistent  improvement  in  unaided 
performance even when the estimated values were not very
accurate.  Similarly,  increasing  the  burden  further  by  speeding 
up  the  input  rate  only  enhanced  the  value  of  the  estimation 
requirement  (although,  of  course,  it  detracted  somewhat  from 
overall  performance). 

A  more  plausible  explanation  in  the  present  case  is  that  both 
the  estimation  requirement  and  machine  aiding  served  to  cast  the 
predictive  information  into  a  form  that  was  conducive  to 
integration  (increasing,  in  a  sense,  its  compatibility  with  the
required  cognitive  operations).  Such  pre-processing  presumably 
did  simplify  the  ultimate  integration  step,  but  in  a  way  that 
encouraged  preserving  rather  than  discarding  predictive 
information.  The  important  point  is  that  without  an  explicit 
pre-processing  step,  subjects  tended  to  simplify  in  other,  less 
productive  ways  (overselection). 

The  results  support  Hammond's  (1980)  thesis  that  congruence 
between  the  decision  maker's  mode  of  cognition  and  the  mode  of 
processing  induced  by  the  task  characteristics  yields  the  most 
nearly  optimal  judgments.  The  nature  of  the  threat  evaluation 
task  was  such  that  it  could  be  performed  best  within  an
analytical  framework.  When  cue  estimation  helped  to  provide  that 
framework  (by  transforming  real-time  events  into  cue  values),  the 
decision  maker's  performance  improved.  The  manual  processing  of 
cues  may  have  shifted  the  decision  maker  from  an  intuitive  to  an 
analytical  mode  of  processing.  However,  when  an  analytical 
framework  was  inherent  in  the  task  itself  (through  the  automated
pre-processing  of  cues),  cue  estimation  was  not  helpful.

Finally,  when  the  aiding  condition  provided  a  framework  midway 
between  raw  events  and  pre-processed  cue  values,  estimation  was 
helpful  to  an  extent  midway  between  the  raw-observation  and 
automatic-processing  conditions. 

From  a  practical  standpoint,  the  present  results  have  two 
major  implications.  First,  one  cannot  assume  that  the  weighting 
strategies  revealed  through  the  typical  policy-capturing  study 
apply  to  "unprocessed"  predictive  data.  Structuring  the  problem 
so  as  to  provide  the  "judge"  with  explicit  "cue  values"  dictates 
to  an  extent  how  he/she  will  integrate  those  cues. 
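
The weights that a policy-capturing analysis recovers can be sketched as an ordinary least-squares fit of judgments on cue values. The example below is a minimal illustration under stated assumptions, not the procedure used in these experiments: the cue values are simulated, the judge is a hypothetical noise-free linear rule, and the helper names (`solve_linear_system`, `capture_policy`) are invented here.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def capture_policy(cue_rows, judgments):
    """Least-squares weights (intercept first) from the normal equations."""
    design = [[1.0] + list(row) for row in cue_rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, judgments))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: 40 trials with four cue values each.
rng = random.Random(0)
cues = [[rng.uniform(0, 10) for _ in range(4)] for _ in range(40)]
true_weights = [5.0, 2.4, 0.9, 0.7]  # illustrative judge, not a reported result
judged = [sum(w * x for w, x in zip(true_weights, row)) for row in cues]

weights = capture_policy(cues, judged)  # [intercept, w1, w2, w3, w4]
```

Because the simulated judge is given explicit cue values and applies a fixed linear rule, the fit recovers the generating weights. The point of the paragraph above is that no such guarantee holds when the same judge must extract those values from unprocessed events.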

Second,  one  does  not  have  to  incorporate  machine  aiding  into
the  system  in  order  to  realize  some  of  the  benefits  from
structuring  or  pre-processing  a  stream  of  predictive  evidence.
The  pre-processing  can  be  done  manually.  This  could  be  an
important  consideration  in  situations  that,  for  one  reason  or 
another,  preclude  automated  processing.  The  fact  that  merely 
requiring  an  estimation  step  can  markedly  enhance  diagnostic 
judgment  provides  the  system  designer  with  a  useful  alternative. 
Of  course,  the  present  work  is  only  a  beginning;  much  remains  to 
be  learned  about  the  influence  of  various  forms  of  pre-processing 
on  various  kinds  of  subsequent  judgments  and  decisions.  We  have 
examined  but  one  set  of  processes  in  a  fairly  simple  task 
setting.  However,  finding  the  pronounced  effects  that  we  did  in 
even  this  limited  context  suggests  that  the  approach  is  well 
worth  pursuing  into  other,  more  complex,  task  domains.  A
specific  question  in  need  of  an  answer  is  how  far  accuracy  of
manual  pre-processing  can  decline  before  the  advantage  of  that
pre-processing  is  offset  by  the  poor  quality  of  the  resulting
cues.

Table 1
Experiment 1

Mean Correlations Between Actual and Optimal Threat Assessments

                                Aiding
                  Automatic Tabulation    Self Tabulation
Group                M       SD              M       SD
Estimation          .82     .09             .82     .15
No Estimation       .75     .21             .74     .12

Table 2
Experiment 1

Mean B-Weights of Each Region For the Estimation
and No Estimation Group

              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         5.41   1.31    3.14   1.24    1.48   1.60    0.82   1.07
No Est.      5.00   1.72    2.40   1.24    0.87   1.54    0.66   1.61
Optimal      4.00           3.00           2.00           1.00

Table 4
Experiment 2

Mean Correlations Between Actual/Optimal Threat Assessments

                               Aiding
                      None                      Tabulation
              Fast Rate    Slow Rate      Fast Rate    Slow Rate
Group         M     SD     M     SD       M     SD     M     SD
Est.         .77   .10    .82   .10      .83   .11    .90   .06
No Est.      .52   .19    .50   .25      .78   .12    .80   .11

Table 5
Experiment 2

Mean B-Weights For Each Region For the Different
Aiding and Estimation Groups

Unaided Group
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         5.51   1.42    2.65   1.50    0.73   1.08    0.31   1.36
No Est.      4.12   2.03    1.52   1.80    -.07   2.09    -.88   1.74
Optimal      4.00           3.00           2.00           1.00

Tabulation Group
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         5.48   2.21    3.12   1.18    1.20   1.61     .38   1.20
No Est.      4.67   2.06    2.70   1.02    1.14   1.40     .05   2.39
Optimal      4.00           3.00           2.00           1.00

Computation Group
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         5.64   1.34    3.16   1.01    1.00   1.07     .78   1.39
No Est.      6.01   1.31    3.51   1.26    1.01   0.95     .33   0.88
Optimal      4.00           3.00           2.00           1.00

APPENDIX


Table 1
Experiment 2

Mean B-Weights For Each Region For the Different
Aiding, Estimation, and Difficulty (Fast Rate vs. Slow Rate)
Conditions

Unaided Group - Fast Rate
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         4.96   1.58    2.12   2.08    0.84   1.34    0.66   2.05
No Est.      3.83   3.27    1.75   2.18    0.34   3.21   -1.94   2.52
Optimal      4.00           3.00           2.00           1.00

Unaided Group - Slow Rate
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         6.03   1.66    3.19   1.23    0.73   1.33    0.10   1.54
No Est.      4.13   2.20    1.28   1.97   -0.47   2.04    0.18   2.21
Optimal      4.00           3.00           2.00           1.00

Tabulation Group - Fast Rate
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         5.62   2.47    2.63   1.37    0.77   1.92    0.25   1.43
No Est.      4.94   1.93    2.18   1.08    0.33   1.68    0.01   2.72
Optimal      4.00           3.00           2.00           1.00

Tabulation Group - Slow Rate
              Region 1       Region 2       Region 3       Region 4
Group         M      SD      M      SD      M      SD      M      SD
Est.         4.95   2.48    3.62   1.27    1.62   1.74    0.51   1.17
No Est.      4.39   2.80    3.22   1.54    1.40   1.56    0.03   2.57
Optimal      4.00           3.00           2.00           1.00